Despite their excellent performance in image generation, Generative Adversarial Networks (GANs) are notorious for their enormous storage requirements and intensive computation. As a powerful "performance booster", knowledge distillation has proven particularly efficacious in producing low-cost GANs. In this paper, we investigate the irreplaceability of the teacher discriminator and present a novel discriminator-cooperated distillation, abbreviated as DCD, for refining better feature maps from the generator. In contrast to conventional pixel-to-pixel matching in feature-map distillation, our DCD utilizes the teacher discriminator as a transformation that drives intermediate results of the student generator to be perceptually close to the corresponding outputs of the teacher generator. Furthermore, to mitigate mode collapse in GAN compression, we construct a collaborative adversarial training paradigm in which the teacher discriminator is established from scratch and co-trained with the student generator alongside our DCD. Our DCD shows superior results compared with existing GAN compression methods. For instance, after reducing the MACs of CycleGAN by over 40x and its parameters by over 80x, we decrease the FID from 61.53 to 48.24, whereas the current state-of-the-art method reaches only 51.92. The source code is available at https://github.com/poopit/DCD-official.
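To make the discriminator-cooperated matching concrete, here is a minimal PyTorch sketch (not the authors' code): a toy discriminator stands in for the frozen teacher discriminator, and both generators' outputs are compared in its activation space rather than pixel space. The architecture and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDisc(nn.Module):
    """Toy PatchGAN-style discriminator exposing intermediate activations.
    Stands in for the frozen teacher discriminator; purely illustrative."""
    def __init__(self, ch=3):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Conv2d(ch, 32, 4, stride=2, padding=1),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),
        ])

    def features(self, x):
        feats = []
        for blk in self.blocks:
            x = F.leaky_relu(blk(x), 0.2)
            feats.append(x)
        return feats

def dcd_loss(student_img, teacher_img, disc):
    # Project both generator outputs through the frozen teacher discriminator
    # and match activations there, instead of a pixel-to-pixel feature match.
    fs = disc.features(student_img)
    with torch.no_grad():
        ft = disc.features(teacher_img)
    return sum(F.l1_loss(a, b) for a, b in zip(fs, ft))

disc = TinyDisc().eval()
for p in disc.parameters():          # keep the teacher discriminator frozen
    p.requires_grad_(False)
student_out = torch.rand(2, 3, 64, 64, requires_grad=True)
teacher_out = torch.rand(2, 3, 64, 64)
dcd_loss(student_out, teacher_out, disc).backward()
```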
CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between mixed images and their corresponding labels harms its efficacy. Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels, but they inevitably introduce heavy training overhead or require extra information, undermining ease of use. To this end, we propose an efficient and effective Self-Motivated image Mixing method (SMMix), in which both the images and the labels are enhanced by the model under training itself. Specifically, we propose a max-min attention region mixing approach that enriches the attention-focused objects in the mixed images. We then introduce a fine-grained label assignment technique that co-trains the output tokens of mixed images under fine-grained supervision. Moreover, we devise a novel feature consistency constraint to align the features of mixed and unmixed images. Owing to the subtle designs of the self-motivated paradigm, our SMMix achieves both smaller training overhead and better performance than other CutMix variants. In particular, SMMix improves the accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on ImageNet-1k. The generalization capability of our method is also demonstrated on downstream tasks and out-of-distribution datasets. The code of this project is available at https://github.com/ChenMnZ/SMMix.
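As a rough sketch of the max-min attention region mixing step, the PyTorch snippet below pastes the most-attended window of one image over the least-attended window of another, given per-patch attention scores. The window size, the plain CutMix-style label weight, and all names are placeholder assumptions; the paper's fine-grained label assignment and feature constraint are not reproduced here.

```python
import torch
import torch.nn.functional as F

def smmix_images(images, attn, side=2):
    """Max-min attention region mixing (illustrative sketch).

    images: (B, C, H, W); attn: (B, h, w) per-patch attention from the model
    under training. A side x side patch window with maximal attention in a
    shuffled partner image is pasted over the minimally attended window of
    each target image.
    """
    B, C, H, W = images.shape
    h, w = attn.shape[1:]
    ph, pw = H // h, W // w                                   # pixels per patch
    pooled = F.avg_pool2d(attn.unsqueeze(1), side, stride=1).squeeze(1)
    flat = pooled.flatten(1)
    src = flat.argmax(1)                                      # most-attended window
    dst = flat.argmin(1)                                      # least-attended window
    perm = torch.randperm(B)
    mixed = images.clone()
    for i in range(B):
        sy, sx = divmod(src[perm[i]].item(), pooled.shape[2])
        dy, dx = divmod(dst[i].item(), pooled.shape[2])
        mixed[i, :, dy * ph:(dy + side) * ph, dx * pw:(dx + side) * pw] = \
            images[perm[i], :, sy * ph:(sy + side) * ph, sx * pw:(sx + side) * pw]
    # SMMix supervises output tokens with fine-grained labels; a plain
    # CutMix-style area weight is used here only as a placeholder.
    lam = 1.0 - (side * side) / (h * w)
    return mixed, perm, lam

mixed, perm, lam = smmix_images(torch.rand(4, 3, 224, 224), torch.rand(4, 14, 14))
```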
This paper proposes a content relationship distillation (CRD) to tackle over-parameterized generative adversarial networks (GANs) for serviceability on edge devices. In contrast to traditional instance-level distillation, we design a novel GAN-compression-oriented knowledge by slicing the contents of teacher outputs into multiple fine-grained granularities, such as row/column strips (global information) and image patches (local information), modeling the relationships among them, such as pairwise distances and triplet-wise angles, and encouraging the student to capture these relationships within its own output contents. Built upon our proposed content-level distillation, we also deploy an online teacher discriminator, which keeps updating when co-trained with the teacher generator and stays frozen when co-trained with the student generator, for better adversarial training. We perform extensive experiments on three benchmark datasets, and the results show that our CRD achieves the greatest complexity reduction on GANs while obtaining the best performance in comparison with existing methods. For example, we reduce the MACs of CycleGAN by around 40x and its parameters by over 80x, while obtaining an FID of 46.61, compared with 51.92 for the current state of the art. The code of this project is available at https://github.com/TheKernelZ/CRD.
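A minimal sketch of one content-relationship term might look as follows: row strips with pairwise Euclidean distances only, with the column-strip, patch, and triplet-angle terms omitted. All names and the distance normalization are illustrative assumptions, not the released code.

```python
import torch
import torch.nn.functional as F

def _rel(slices):
    # Pairwise Euclidean distances among content slices, scale-normalized.
    d = torch.cdist(slices, slices)
    return d / (d.mean() + 1e-8)

def crd_row_loss(student_out, teacher_out, rows=4):
    """One term of content relationship distillation (illustrative sketch).

    Each output image is sliced into `rows` horizontal strips, and the
    student is asked to reproduce the teacher's pairwise-distance structure
    among its own strips.
    """
    def strips(x):
        B, C, H, W = x.shape
        return (x.view(B, C, rows, H // rows, W)
                 .permute(0, 2, 1, 3, 4).reshape(B, rows, -1))
    s, t = strips(student_out), strips(teacher_out)
    loss = sum(F.smooth_l1_loss(_rel(s[b]), _rel(t[b]).detach())
               for b in range(s.shape[0]))
    return loss / s.shape[0]

loss = crd_row_loss(torch.rand(2, 3, 64, 64, requires_grad=True),
                    torch.rand(2, 3, 64, 64))
loss.backward()
```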
Most shadow removal methods rely on large volumes of training images paired with laborious and costly shadow-region annotations, which has led to the increasing popularity of shadow image synthesis. However, poor performance also stems from these synthesized images, since they are often shadow-inauthentic and detail-impaired. In this paper, we present a novel generation framework, referred to as HQSS, for high-quality pseudo shadow image synthesis. A given image is first decoupled into a shadow region identity and a non-shadow region identity. HQSS employs a shadow feature encoder and a generator to synthesize pseudo images. Specifically, the encoder extracts the shadow feature of one region identity, which is then paired with another region identity to serve as the generator input for synthesizing a pseudo image. The pseudo image is expected to carry the shadow characteristics of its input shadow feature while retaining realistic image details from its input region identity. To fulfill this goal, we design three learning objectives. When the shadow feature and the input region identity come from the same region, we propose a self-reconstruction loss that guides the generator to reconstruct a pseudo image identical to its input. When the shadow feature and the input region identity come from different regions, we introduce an inter-reconstruction loss and a cycle-reconstruction loss to make sure that shadow characteristics and detail information are well retained in the synthesized images. Our HQSS is observed to outperform state-of-the-art methods on the ISTD, Video Shadow Removal, and SRD datasets. The code is available at https://github.com/zysxmu/HQSS.
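A hedged sketch of how the three objectives could be wired together, with toy stand-ins for the shadow feature encoder and the generator; the call signatures, architectures, and loss choices are assumptions, not the released HQSS code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyShadowEncoder(nn.Module):
    """Stand-in for the shadow feature encoder (illustrative architecture)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
    def forward(self, x):
        return self.conv(x)

class ToyGenerator(nn.Module):
    """Stand-in generator taking a region identity and a shadow feature."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3 + 8, 3, 3, padding=1)
    def forward(self, region, shadow_feat):
        return self.conv(torch.cat([region, shadow_feat], dim=1))

def hqss_losses(E, G, region_a, region_b):
    # Self-reconstruction: a region paired with its own shadow feature
    # should be reproduced exactly.
    self_rec = F.l1_loss(G(region_a, E(region_a)), region_a)
    # Inter-reconstruction: a's identity with b's shadow feature; the fake
    # image should still carry b's shadow characteristics.
    fake = G(region_a, E(region_b))
    inter = F.l1_loss(E(fake), E(region_b).detach())
    # Cycle-reconstruction: re-injecting a's own shadow feature should
    # recover region a's details.
    cycle = F.l1_loss(G(fake, E(region_a)), region_a)
    return self_rec + inter + cycle

E, G = ToyShadowEncoder(), ToyGenerator()
loss = hqss_losses(E, G, torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
```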
This paper focuses on the limitations of current over-parameterized shadow removal models. We propose a novel lightweight deep neural network that processes shadow images in the LAB color space. The proposed network, termed LAB-Net, is motivated by three observations: First, the LAB color space well separates luminance information from color properties. Second, sequentially stacked convolutional layers fail to fully exploit features from different receptive fields. Third, non-shadow regions are important prior knowledge that can diminish the drastic differences between shadow and non-shadow regions. Consequently, we design LAB-Net as a structure with two branches: an L branch and an AB branch. The shadow-related luminance information can thus be well processed in the L branch, while the color properties are well retained in the AB branch. In addition, each branch is composed of several basic blocks, local spatial attention (LSA) modules, and convolutional filters. Each basic block consists of multiple parallel dilated convolutions with different dilation rates to capture different receptive fields, using different network widths to save model parameters and computational cost. Then, an enhanced channel attention (ECA) module is constructed to aggregate features from different receptive fields for better shadow removal. Finally, the LSA module is further developed to fully exploit the prior information in non-shadow regions to clean up shadow regions. We perform extensive experiments on the ISTD and SRD datasets. Experimental results show that our LAB-Net well outperforms state-of-the-art methods, while our model parameters and computational cost are reduced by several orders of magnitude. Our code is available at https://github.com/ngrxmu/lab-net.
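The two-branch idea can be sketched as a toy skeleton, not LAB-Net itself: the LSA/ECA attention modules are omitted, the channel widths are guesses, and the input is assumed to be pre-converted to LAB (e.g., via kornia.color.rgb_to_lab).

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Parallel dilated convolutions with different rates and widths,
    loosely mirroring LAB-Net's basic block (widths are assumptions)."""
    def __init__(self, ch):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Conv2d(ch, ch // 2, 3, padding=1, dilation=1),
            nn.Conv2d(ch, ch // 4, 3, padding=2, dilation=2),
            nn.Conv2d(ch, ch // 4, 3, padding=4, dilation=4),
        ])
    def forward(self, x):
        return torch.relu(torch.cat([p(x) for p in self.paths], dim=1))

class TwoBranchLAB(nn.Module):
    """Two-branch skeleton in the spirit of LAB-Net (illustrative sketch)."""
    def __init__(self, ch=16):
        super().__init__()
        self.l_in = nn.Conv2d(1, ch, 3, padding=1)
        self.ab_in = nn.Conv2d(2, ch, 3, padding=1)
        self.l_body, self.ab_body = DilatedBlock(ch), DilatedBlock(ch)
        self.l_out = nn.Conv2d(ch, 1, 3, padding=1)
        self.ab_out = nn.Conv2d(ch, 2, 3, padding=1)

    def forward(self, lab):
        l, ab = lab[:, :1], lab[:, 1:]      # shadows mostly affect luminance L
        l = self.l_out(self.l_body(self.l_in(l)))
        ab = self.ab_out(self.ab_body(self.ab_in(ab)))
        return torch.cat([l, ab], dim=1)

out = TwoBranchLAB()(torch.rand(1, 3, 64, 64))
```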
Knowledge distillation (KD) transfers knowledge from a high-capacity teacher network to strengthen a smaller student. Existing methods focus on excavating knowledge hints and transferring the whole of this knowledge to the student. However, knowledge redundancy arises because the knowledge shows different values to the student at different learning stages. In this paper, we propose Knowledge Condensation Distillation (KCD). Specifically, the knowledge value on each sample is dynamically estimated, and a compact knowledge set is iteratively condensed from the teacher, based on an Expectation-Maximization (EM) framework, to guide the student's learning. Our approach is easily built on top of off-the-shelf KD methods, with no extra training parameters and negligible computational overhead. It thus presents a new perspective for KD, in which a student that actively identifies the teacher's knowledge can learn more effectively and efficiently. Experiments on standard benchmarks show that the proposed KCD can well boost the performance of student models with even higher distillation efficiency. Code is available at https://github.com/dzy3/kcd.
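One plausible reading of the condensation step, sketched in PyTorch: the student-teacher KL gap serves as the per-sample knowledge value, and only a top fraction is kept for the next distillation round. The thresholding rule and the exact EM alternation are simplified assumptions here.

```python
import torch
import torch.nn.functional as F

def condense(teacher_logits, student_logits, keep=0.6, T=4.0):
    """One condensation step in the spirit of KCD (illustrative sketch).

    The 'value' of each sample's teacher knowledge is approximated by the
    student-teacher KL divergence; only the top `keep` fraction is retained
    for distillation this round. The real method alternates this estimate
    with training inside an EM loop.
    """
    p_t = F.log_softmax(teacher_logits / T, dim=1)
    p_s = F.log_softmax(student_logits / T, dim=1)
    value = F.kl_div(p_s, p_t, log_target=True, reduction='none').sum(1)
    k = max(1, int(keep * value.numel()))
    return value.topk(k).indices            # samples whose knowledge still helps

def kcd_loss(teacher_logits, student_logits, idx, T=4.0):
    # Standard temperature-scaled KD loss, computed on the condensed set only.
    p_t = F.log_softmax(teacher_logits[idx] / T, dim=1)
    p_s = F.log_softmax(student_logits[idx] / T, dim=1)
    return F.kl_div(p_s, p_t, log_target=True, reduction='batchmean') * T * T

t, s = torch.randn(32, 10), torch.randn(32, 10, requires_grad=True)
idx = condense(t, s)
kcd_loss(t, s, idx).backward()
```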
By forcing at most N out of M consecutive weights to be non-zero, the recent N:M network sparsity has received increasing attention for its two attractive advantages: 1) promising performance at high sparsity; 2) significant speedups on NVIDIA A100 GPUs. Recent studies require expensive training phases or heavy gradient computation. In this paper, we show that N:M learning can be naturally characterized as a combinatorial problem that searches for the best combination candidate within a finite set. Motivated by this characterization, we solve N:M sparsity in an efficient divide-and-conquer manner. First, we divide the weight vector into $C_\text{M}^\text{N}$ combination subsets of a fixed size N. Then, we conquer the combinatorial problem by assigning each combination a learnable score that is jointly optimized with its associated weights. We prove that the introduced scoring mechanism can well model the relative importance between combination subsets. By gradually removing low-scored subsets, N:M fine-grained sparsity can be efficiently optimized during the normal training phase. Comprehensive experiments show that our Learning Best Combination (LBC) consistently performs better than off-the-shelf N:M sparsity methods. Our code is released at https://github.com/zyxxmu/lbc.
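A minimal sketch of the combinatorial view: enumerate all $C_\text{M}^\text{N}$ candidate masks per weight group, attach a learnable score to each, and apply the winner. The straight-through training and the gradual removal schedule from the paper are omitted; all names are assumptions.

```python
import itertools
import torch
import torch.nn as nn

class LBCMask(nn.Module):
    """Learn the best N-out-of-M combination per weight group (sketch).

    For each group of M consecutive weights, a learnable score is kept for
    every C(M, N) candidate binary mask; the highest-scoring candidate is
    applied at forward time.
    """
    def __init__(self, num_groups, N=2, M=4):
        super().__init__()
        combos = list(itertools.combinations(range(M), N))
        masks = torch.zeros(len(combos), M)
        for i, c in enumerate(combos):
            masks[i, list(c)] = 1.0
        self.register_buffer('masks', masks)              # (C(M,N), M)
        self.scores = nn.Parameter(0.01 * torch.randn(num_groups, len(combos)))

    def forward(self, weight):                            # weight: (num_groups, M)
        best = self.scores.argmax(dim=1)                  # winning combination
        return weight * self.masks[best]

w = torch.randn(8, 4)
sparse_w = LBCMask(num_groups=8)(w)                       # 2:4-sparse copy of w
```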
Shadow removal has drawn increasing attention because shadow-free images are critical to many downstream multimedia tasks. Current methods apply the same convolution operations to shadow and non-shadow regions alike, ignoring the large gap between the color mappings of the two, which leads to poor reconstruction quality and a heavy computation burden. To solve this problem, this paper introduces a novel plug-and-play Shadow-Aware Dynamic Convolution (SADC) module to decouple the interdependence between shadow and non-shadow regions. Inspired by the fact that the color mapping of non-shadow regions is easier to learn, our SADC processes the non-shadow regions with a computationally cheap, lightweight convolution module, and recovers the shadow regions with a more complicated convolution module to ensure the quality of image reconstruction. Given that non-shadow regions often contain more background color information, we further develop a novel intra-convolution distillation loss to strengthen the information flow from non-shadow regions to shadow regions. Extensive experiments on the ISTD and SRD datasets show that our method achieves better shadow removal performance than many state-of-the-art methods. Our code is available at https://github.com/xuyimin0926/sadc.
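The routing idea can be sketched with a toy block (an illustrative assumption, not the released module): a depthwise convolution plays the cheap non-shadow path, a two-layer convolution the expensive shadow path, and a binary mask switches between them.

```python
import torch
import torch.nn as nn

class SADCBlock(nn.Module):
    """Shadow-aware dynamic convolution, simplified (illustrative sketch).

    Non-shadow pixels go through a cheap depthwise convolution; shadow
    pixels go through a heavier two-layer convolution. The real module's
    dynamic kernels and intra-convolution distillation loss are omitted.
    """
    def __init__(self, ch=32):
        super().__init__()
        self.light = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)  # depthwise: cheap
        self.heavy = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x, shadow_mask):      # shadow_mask: (B, 1, H, W) in {0, 1}
        return shadow_mask * self.heavy(x) + (1 - shadow_mask) * self.light(x)

block = SADCBlock()
y = block(torch.rand(1, 32, 64, 64), (torch.rand(1, 1, 64, 64) > 0.5).float())
```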
In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e.g., phrase localization, referring expression comprehension (REC), and segmentation (RES). The canonical paradigms for visual grounding often require substantial expertise in designing network architectures and loss functions, making them hard to generalize across tasks. To simplify and unify the modeling, we cast visual grounding as a point prediction problem conditioned on image and text inputs, where the bounding box or binary mask is represented as a sequence of discrete coordinate tokens. Under this paradigm, visual grounding tasks are unified in our SeqTR network without task-specific branches or heads, e.g., the convolutional mask decoder for RES, which greatly reduces the complexity of multi-task modeling. In addition, SeqTR shares the same optimization objective for all tasks with a simple cross-entropy loss, further reducing the complexity of deploying hand-crafted loss functions. Experiments on five benchmark datasets demonstrate that the proposed SeqTR outperforms (or is on par with) the existing state of the art, proving that a simple yet universal approach to visual grounding is indeed feasible. The source code is available at https://github.com/sean-zhuh/seqtr.
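The coordinate-token representation at the heart of this paradigm can be sketched as a simple quantizer; the bin count and sequence layout here are assumptions, and the actual SeqTR tokenization may differ in details.

```python
import torch

def box_to_tokens(box, img_w, img_h, num_bins=1000):
    """Quantize a box into discrete coordinate tokens (SeqTR-style sketch).

    box: (x1, y1, x2, y2) in pixels. Each coordinate is binned into
    [0, num_bins), so a single cross-entropy loss over the token sequence
    can supervise boxes (and, with more points, masks).
    """
    x1, y1, x2, y2 = box
    return torch.tensor([
        int(x1 / img_w * (num_bins - 1)),
        int(y1 / img_h * (num_bins - 1)),
        int(x2 / img_w * (num_bins - 1)),
        int(y2 / img_h * (num_bins - 1)),
    ])

def tokens_to_box(tokens, img_w, img_h, num_bins=1000):
    # Invert the quantization back to pixel coordinates.
    x1, y1, x2, y2 = (tokens.float() / (num_bins - 1)).tolist()
    return x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h

tok = box_to_tokens((32, 48, 200, 180), 640, 480)
box = tokens_to_box(tok, 640, 480)
```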
This paper proposes an Any-time super-Resolution Method (ARM) to address over-parameterized single image super-resolution (SISR) models. Our ARM is motivated by three observations: (1) The performance of different image patches varies with SISR networks of different sizes. (2) There is a tradeoff between computation overhead and the quality of reconstructed images. (3) Given an input image, its edge information is an effective option for estimating its PSNR. Subsequently, we train an ARM supernet containing SISR subnets of different sizes to deal with image patches of various complexities. To that end, we construct an Edge-to-PSNR lookup table that maps the edge score of an image patch to the PSNR performance of each subnet, together with a set of computation costs for the subnets. At inference, image patches are individually assigned to different subnets for a better computation-performance tradeoff. Moreover, each SISR subnet shares the weights of the ARM supernet, so no extra parameters are introduced. The setting of multiple subnets can well adapt the computational cost of the SISR model to the dynamically available hardware resources, allowing the SISR task to be served at any time. Extensive experiments on resolution datasets of different sizes, with popular SISR networks as backbones, verify the effectiveness and versatility of our ARM. The source code is available at https://github.com/chenbong/arm-net.
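A toy sketch of the Edge-to-PSNR lookup-table dispatch follows; everything in it is an illustrative assumption (the Sobel edge score, the binning, the stand-in subnets, and the tradeoff rule). ARM's real supernet shares weights across subnets and uses a more refined cost-performance tradeoff.

```python
import torch
import torch.nn.functional as F

def edge_score(patch):
    """Mean Sobel gradient magnitude as a cheap edge score (one common choice)."""
    gray = patch.mean(dim=1, keepdim=True)
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, kx.transpose(2, 3), padding=1)
    return (gx.abs() + gy.abs()).mean().item()

def dispatch(patches, subnets, lut, costs, budget):
    """Edge-to-PSNR lookup-table dispatch (illustrative sketch).

    lut[s][b] holds the estimated PSNR of subnet s for edge-score bin b,
    and costs[s] its MACs. Among the subnets within `budget`, each patch
    is routed to the one with the best estimated PSNR.
    """
    bins = len(lut[0])
    outs = []
    for p in patches:
        p = p.unsqueeze(0)
        b = min(int(edge_score(p) * bins), bins - 1)      # toy binning scheme
        affordable = [s for s in range(len(subnets)) if costs[s] <= budget]
        s = max(affordable, key=lambda s: lut[s][b])
        outs.append(subnets[s](p))
    return outs

# Toy "subnets" and a 2-bin toy PSNR table, standing in for the supernet.
subnets = [torch.nn.Upsample(scale_factor=2, mode='nearest'),
           torch.nn.Upsample(scale_factor=2, mode='bicubic')]
lut, costs = [[24.0, 22.5], [25.1, 24.0]], [1, 3]
sr_patches = dispatch(list(torch.rand(4, 3, 16, 16)), subnets, lut, costs, budget=3)
```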